AITopics | decentralized training

Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

We consider distributed stochastic variational inequalities (VIs) on unbounded domains, where the problem data are heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers fully decentralized computation over time-varying networks as well as the centralized topologies commonly used in Federated Learning. Moreover, workers can perform multiple local updates to reduce how often they communicate with each other. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone (when a Minty solution exists) settings. The resulting rates explicitly show the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPPs), e.g., the training of Deep Generative Adversarial Networks (GANs), for which decentralized training has been reported to be extremely challenging. In experiments on the decentralized training of GANs, we demonstrate the effectiveness of our proposed approach.
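A minimal pure-Python sketch of the general idea, not the authors' implementation: each worker runs stochastic extragradient steps on its own local operator and, every H local steps, averages with its ring neighbours through a doubly stochastic gossip matrix. The toy problem (a strongly monotone saddle point f_i(x, y) = mu/2 x^2 + a_i x y - mu/2 y^2 + b_i x - c_i y with per-worker coefficients), the ring topology, the Gaussian noise model, and all names and parameters (gamma, H, sigma) are assumptions chosen for illustration.

```python
import random

def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring (n >= 3): weight 1/3 on self and each neighbour."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W

def local_operator(z, mu, a, b, c):
    """F_i(x, y) = (df_i/dx, -df_i/dy) for f_i(x, y) = mu/2 x^2 + a x y - mu/2 y^2 + b x - c y."""
    x, y = z
    return (mu * x + a * y + b, -a * x + mu * y + c)

def gossip(W, Z):
    """One round of neighbour averaging: z_i <- sum_j W[i][j] z_j."""
    n, d = len(Z), len(Z[0])
    return [[sum(W[i][j] * Z[j][k] for j in range(n)) for k in range(d)] for i in range(n)]

def decentralized_local_extragradient(mu, a, b, c, gamma=0.02, H=2, T=2000, sigma=0.1, seed=0):
    rng = random.Random(seed)
    n = len(a)
    W = ring_mixing_matrix(n)
    Z = [[0.0, 0.0] for _ in range(n)]

    def noisy_step(point, base, i):
        # base - gamma * (F_i(point) + Gaussian noise): one stochastic operator evaluation
        F = local_operator(point, mu, a[i], b[i], c[i])
        return [base[k] - gamma * (F[k] + sigma * rng.gauss(0.0, 1.0)) for k in range(2)]

    for t in range(1, T + 1):
        for i in range(n):
            z_half = noisy_step(Z[i], Z[i], i)  # extrapolation step
            Z[i] = noisy_step(z_half, Z[i], i)  # update step, operator taken at the extrapolated point
        if t % H == 0:  # communicate only every H local steps
            Z = gossip(W, Z)
    z_bar = [sum(z[k] for z in Z) / n for k in range(2)]
    return z_bar, Z

# Heterogeneous workers; the averaged operator (x + y + 2, -x + y) vanishes at z* = (-1, -1).
a, b, c = [0.5, 1.0, 1.5, 1.0], [1.0, 2.0, 3.0, 2.0], [0.0, -1.0, 1.0, 0.0]
z_bar, Z = decentralized_local_extragradient(1.0, a, b, c)
print(z_bar)  # approximately [-1, -1]
```

With a constant step size and heterogeneous local operators, the averaged iterate settles in a small neighbourhood of z* rather than exactly on it; the size of that neighbourhood grows with gamma, H, and the data heterogeneity, which is the kind of dependence the abstract says the rates make explicit.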

decentralized local stochastic extra-gradient, name change, variational inequality, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Decentralized Training of Foundation Models in Heterogeneous Environments

Neural Information Processing Systems, Dec-24-2025, 21:53:45 GMT

Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often involving tens of thousands of GPUs running continuously for months. These models are typically trained in specialized clusters featuring fast, homogeneous interconnects and using carefully designed software systems that support both data parallelism and model/pipeline parallelism. Such dedicated clusters can be costly and difficult to obtain. Can we instead leverage the much greater amount of decentralized, heterogeneous, and lower-bandwidth interconnected compute? Previous works examining the heterogeneous, decentralized setting focus on relatively small models that can be trained in a purely data parallel manner.
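Why lower-bandwidth, heterogeneous links matter once a model needs pipeline parallelism can be seen with a back-of-the-envelope cost model. The sketch below is a hypothetical illustration, not the scheduler from the paper: it assumes each pipeline stage sits in one device group, that the only cost is sending one micro-batch of fp16 activations between consecutive stages, and a made-up symmetric bandwidth matrix in Gbit/s. It brute-forces the stage order whose slowest link is fastest. The hidden size 12288 and sequence length 2048 are those of GPT-3 175B.

```python
from itertools import permutations

def activation_bytes(micro_batch, seq_len, hidden, bytes_per_elem=2):
    """Size of one micro-batch of activations passed between pipeline stages (fp16 by default)."""
    return micro_batch * seq_len * hidden * bytes_per_elem

def bottleneck_seconds(order, bandwidth_gbps, act_bytes):
    """Transfer time over the slowest link when stages are laid out in `order`."""
    return max(act_bytes * 8 / (bandwidth_gbps[s][t] * 1e9) for s, t in zip(order, order[1:]))

def best_stage_order(bandwidth_gbps, act_bytes):
    """Exhaustive search over stage orders; only feasible for a handful of device groups."""
    return min(permutations(range(len(bandwidth_gbps))),
               key=lambda order: bottleneck_seconds(order, bandwidth_gbps, act_bytes))

# Hypothetical bandwidths (Gbit/s) between four device groups; groups 0-1 and 2-3 are co-located.
BANDWIDTH = [
    [0, 10, 1, 1],
    [10, 0, 1, 5],
    [1, 1, 0, 10],
    [1, 5, 10, 0],
]
act = activation_bytes(micro_batch=1, seq_len=2048, hidden=12288)
order = best_stage_order(BANDWIDTH, act)
print(order, round(bottleneck_seconds(order, BANDWIDTH, act), 3))  # (0, 1, 3, 2) 0.081
```

The naive order (0, 1, 2, 3) would route activations over a 1 Gbit/s link and be about five times slower per transfer, which is why placement, and not only raw compute, decides whether such decentralized compute is usable. Real schedulers also have to account for compute speed and data-parallel gradient synchronization, which this toy ignores.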

decentralized training, foundation model, heterogeneous environment, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Communication Compression for Decentralized Training

Neural Information Processing Systems, Nov-20-2025, 22:07:50 GMT

Optimizing distributed learning systems is an art of balancing computation and communication. Two lines of research try to deal with slower networks: communication compression for low-bandwidth networks, and decentralization for high-latency networks. In this paper, we explore a natural question: can the combination of both techniques lead to a system that is robust to both bandwidth and latency? Although the system implication of such a combination is trivial, the underlying theoretical principle and algorithm design are challenging: unlike in centralized algorithms, simply compressing the information exchanged within a decentralized network, even in an unbiased stochastic way, would accumulate error and cause divergence. We develop a framework of quantized, decentralized training and propose two different strategies, which we call extrapolation compression and difference compression. We analyze both algorithms and prove that both converge at the rate of $O(1/\sqrt{nT})$, where $n$ is the number of workers and $T$ is the number of iterations, matching the convergence rate of full-precision, centralized training. We validate our algorithms and find that our proposed algorithm significantly outperforms the best of the merely decentralized and merely quantized algorithms on networks with both high latency and low bandwidth.
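A minimal pure-Python sketch of one plausible reading of difference compression, on a toy problem: each node mixes with its ring neighbours, takes a stochastic gradient step, and then transmits only a quantized version of the change to its own model. Because every node applies exactly the same quantized change to its local model that its neighbours add to their copy, the copies never drift apart. The quadratic local losses, the unbiased stochastic-rounding quantizer, the ring topology, and all names and parameters (gamma, delta, sigma) are assumptions for illustration, not the paper's exact algorithm.

```python
import math
import random

def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring (n >= 3): weight 1/3 on self and each neighbour."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W

def stochastic_round(v, delta, rng):
    """Unbiased stochastic rounding of a scalar onto the grid delta * Z, so E[result] = v."""
    scaled = v / delta
    low = math.floor(scaled)
    return delta * (low + (1 if rng.random() < scaled - low else 0))

def difference_compression_sgd(targets, gamma=0.05, delta=0.01, sigma=0.1, T=1000, seed=0):
    """Node i minimises f_i(x) = 0.5 * ||x - targets[i]||^2; the global optimum is the mean target."""
    rng = random.Random(seed)
    n, d = len(targets), len(targets[0])
    W = ring_mixing_matrix(n)
    X = [[0.0] * d for _ in range(n)]
    for _ in range(T):
        new_X = []
        for i in range(n):
            row = []
            for k in range(d):
                mixed = sum(W[i][j] * X[j][k] for j in range(n))
                grad = (X[i][k] - targets[i][k]) + sigma * rng.gauss(0.0, 1.0)
                half = mixed - gamma * grad
                # Only Q(half - x_i) is broadcast; neighbours add it to their copy of x_i,
                # so every replica of x_i stays identical to node i's own model.
                row.append(X[i][k] + stochastic_round(half - X[i][k], delta, rng))
            new_X.append(row)
        X = new_X
    return X

targets = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 2.0, 0.0]]
X = difference_compression_sgd(targets)
mean_x = [sum(x[k] for x in X) / len(X) for k in range(3)]
print(mean_x)  # approximately [1.0, 0.5, 0.0]
```

Compressing the difference rather than the model itself keeps the quantization error at the scale of the update, which shrinks as training converges; compressing full models on a fine grid, by contrast, injects noise proportional to the grid size at every round regardless of progress.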

communication compression, decentralized training, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information

Corazza, Jan, Aria, Hadi Partovi, Kim, Hyohun, Neider, Daniel, Xu, Zhe

arXiv.org Artificial Intelligence, Oct-20-2025

Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-032-06106-5_5

2506.07829

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

10493aa88605cad5ab4752b04a63d172-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-2-2025, 02:51:38 GMT

We sincerely thank all the reviewers for their efforts. Hughes et al. [2018] extend the inequity aversion model and define a shaped reward. These works aim to improve cooperation but cannot guarantee fairness. We compare against Hughes et al. [2018]; more details will be included in the final version. To verify the effectiveness of the hierarchy, we apply the hierarchy to other baselines in job scheduling, and the results demonstrate its effect. The intuition behind the fair-efficient reward is to maximize resource utilization while penalizing each agent's deviation in utility. The main hyperparameters are listed in the Appendix; we will provide further details in the final version.
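The feedback describes the fair-efficient reward only by its intuition. As a hedged illustration, one hypothetical reward with that shape divides the average utility (a stand-in for resource utilization) by a term that grows with an agent's relative deviation from the average. The function name, the ratio form, and the constant eps are assumptions, not the authors' definition.

```python
def fair_efficient_reward(utilities, i, eps=0.1):
    """Hypothetical shaped reward for agent i: rewards high average utility, penalises deviation."""
    mean_u = sum(utilities) / len(utilities)
    if mean_u <= 0:
        return 0.0  # assumption: no utilization earns no reward
    deviation = abs(utilities[i] / mean_u - 1.0)  # relative gap between agent i and the average
    return mean_u / (eps + deviation)

# Equal utilities earn the largest reward; an agent far above the mean is penalised.
print(fair_efficient_reward([1.0, 1.0, 1.0, 1.0], 0))
print(fair_efficient_reward([2.0, 0.0, 1.0, 1.0], 0))
```

Raising everyone's utility together increases the numerator, while any single agent pulling away from the average (above or below) increases the denominator, which is the trade-off between efficiency and fairness the feedback describes.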

artificial intelligence, fair-efficient reward, hierarchy, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.51)

Decentralized Training of Foundation Models in Heterogeneous Environments

Neural Information Processing Systems, Aug-17-2025, 08:46:10 GMT

Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often involving tens of thousands of GPUs running continuously for months.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Virginia (0.05)
North America > United States > Oregon (0.05)
(8 more...)

Genre: Research Report (0.67)

Industry: Information Technology > Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

Neural Information Processing Systems, Aug-13-2025, 00:03:30 GMT

We consider distributed stochastic variational inequalities (VIs) on unbounded domains, where the problem data are heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers fully decentralized computation over time-varying networks as well as the centralized topologies commonly used in Federated Learning. Moreover, workers can perform multiple local updates to reduce how often they communicate with each other. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone (when a Minty solution exists) settings. The resulting rates explicitly show the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPPs), e.g., the training of Deep Generative Adversarial Networks (GANs), for which decentralized training has been reported to be extremely challenging.

artificial intelligence, decentralized local stochastic extra-gradient, machine learning, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Decentralized Training of Foundation Models in Heterogeneous Environments

Neural Information Processing Systems, May-27-2025, 17:50:55 GMT

Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often involving tens of thousands of GPUs running continuously for months. These models are typically trained in specialized clusters featuring fast, homogeneous interconnects and using carefully designed software systems that support both data parallelism and model/pipeline parallelism. Such dedicated clusters can be costly and difficult to obtain. Can we instead leverage the much greater amount of decentralized, heterogeneous, and lower-bandwidth interconnected compute? Previous works examining the heterogeneous, decentralized setting focus on relatively small models that can be trained in a purely data parallel manner.

decentralized training, foundation model, heterogeneous environment, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.58)
